FST Based Morphological Analyzer for Hindi Language
Abstract
Because Hindi is a highly inflectional language, a Finite State Transducer (FST) based approach is the most efficient way to build a morphological analyzer for it. The work presented in this paper uses the SFST (Stuttgart Finite State Transducer) tool to generate the FST. A lexicon of root words is created, and rules are then added to generate inflected and derived words from these roots. The resulting morphological analyzer was used in a Part-of-Speech (POS) tagger based on the Stanford POS Tagger. The tagger was first trained on a manually tagged corpus, and the Maximum Entropy (MaxEnt) approach of the Stanford POS Tagger was then used to tag input sentences. The morphological analyzer gives approximately 97% correct results. The POS tagger achieves approximately 87% accuracy on sentences whose words are known to the trained model file, and 80% accuracy on sentences containing words unknown to it.
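The abstract describes the analyzer as a lexicon of root words plus rules that generate inflected forms, compiled into an FST with SFST. The Python sketch below is not SFST and does not reproduce the paper's lexicon or rules. It assumes a hypothetical three-word noun lexicon, hypothetical paradigm names, and an SFST-like tag notation (`root<N><gender><number><case>`) chosen only for illustration. The sketch enumerates the finite lexicon-by-rule relation that a compiled transducer would encode. It then inverts that relation, so lookup maps a surface form back to its analyses.

```python
from collections import defaultdict

# Hypothetical toy lexicon (not the paper's): root -> (part of speech, gender).
LEXICON = {
    "कमरा": ("N", "masc"),  # kamra, "room": masculine, ends in -aa
    "घर": ("N", "masc"),    # ghar, "house": masculine, consonant-final
    "नदी": ("N", "fem"),    # nadi, "river": feminine, ends in -ii
}

# Hypothetical inflection rules per paradigm:
# (number, case, characters to strip from the root's end, suffix to append).
PARADIGMS = {
    "masc_aa": [("sg", "dir", 1, "ा"), ("sg", "obl", 1, "े"),
                ("pl", "dir", 1, "े"), ("pl", "obl", 1, "ों")],
    "masc_cons": [("sg", "dir", 0, ""), ("sg", "obl", 0, ""),
                  ("pl", "dir", 0, ""), ("pl", "obl", 0, "ों")],
    "fem_ii": [("sg", "dir", 0, ""), ("sg", "obl", 0, ""),
               ("pl", "dir", 1, "ियाँ"), ("pl", "obl", 1, "ियों")],
}


def paradigm_for(root: str, gender: str) -> str:
    """Pick a paradigm from gender and the root's final vowel sign."""
    if gender == "masc":
        return "masc_aa" if root.endswith("ा") else "masc_cons"
    if gender == "fem" and root.endswith("ी"):
        return "fem_ii"
    raise ValueError(f"no paradigm covers {root!r} ({gender})")


def build_analyzer(lexicon: dict) -> dict:
    """Generate every (surface, analysis) pair, then index it by surface form."""
    table = defaultdict(list)
    for root, (pos, gender) in lexicon.items():
        for number, case, strip, suffix in PARADIGMS[paradigm_for(root, gender)]:
            # len(root) - strip avoids the root[:-0] == "" pitfall when strip is 0.
            surface = root[: len(root) - strip] + suffix
            table[surface].append(f"{root}<{pos}><{gender}><{number}><{case}>")
    return dict(table)


def analyze(table: dict, word: str) -> list:
    """Return all analyses for a surface word (empty list if unknown)."""
    return table.get(word, [])


if __name__ == "__main__":
    analyzer = build_analyzer(LEXICON)
    for word in ["कमरों", "कमरे", "घर", "नदियाँ"]:
        print(word, analyze(analyzer, word))
```

Because the toy relation is finite, a hash table holds exactly the pairs a transducer would accept. A real FST also shares common prefixes and suffixes among states, which matters once the lexicon grows. Some forms are ambiguous: कमरे, for example, is either oblique singular or direct plural, so it returns several analyses. Choosing among them requires sentence context.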
Journal: CoRR
Volume: abs/1207.5409
Publication date: 2012